Abstract. We consider two strategies for sampling rows from $m \times n$ matrices $Q$ with orthonormal
columns. The first strategy samples $c$ rows with replacement, while the second one treats each
row as an iid Bernoulli random variable, and samples it with probability $\gamma = c/m$. We derive
several bounds for the condition numbers of the sampled matrices and express them in terms of the
coherence, $\mu$, of $Q$. In particular, we show that for both sampling strategies the two-norm condition
number of the sampled matrix $SQ$ is bounded by $\sqrt{(1+\epsilon)/(1-\epsilon)}$ with probability at least $1-\delta$
if $c \ge 3 m \mu \epsilon^{-2} \ln(2n/\delta)$. Numerical experiments confirm the accuracy of the bounds, even for small
matrix dimensions. We also present algorithms to generate matrices with user-specified coherence,
and apply the bounds to the solution of general, full-rank least squares problems with the randomized
preconditioner from Blendenpik. A Matlab package, kappa_SQ, implements the matrix generation
algorithms and the two sampling strategies, and plots the condition numbers and the various bounds.
1. Introduction. Our paper was inspired by Avron, Maymounkov and Toledo's
Blendenpik algorithm and analysis [1].

Blendenpik solves least squares problems for full column rank matrices $A$ with
the Krylov space method LSQR [15]. The important part is the preconditioner. After
"smoothing out" $A$ by a randomized unitary transform $F$, Blendenpik uniformly
samples (i.e. selects) a small number of rows from $FA$, computes a QR factorization
of this smaller sampled submatrix, and then uses the upper triangular factor, $R_s$, as
a preconditioner for LSQR (the subscript $s$ denotes quantities associated with the
sampled matrix).

The analysis in [1] suggests that the preconditioned matrix $AR_s^{-1}$ is well
conditioned, if $FA$ has good "coherence". Simply put, the coherence of a matrix gives
information about how "uniform" the elements in its orthonormal QR factor are. If
$FA = QR$ is a QR factorization, and if the columns of $Q$ are close to columns of a
permutation matrix, then $FA$ has high (bad) coherence. However, if the columns of
$Q$ are close to columns of a Hadamard matrix, then $FA$ has low (good) coherence.
In a matrix with good coherence, it should not matter which rows one samples, and
any sampled submatrix is likely to have full rank. The randomized transform $F$ is
designed to improve the coherence of $A$.

We were intrigued by the analysis of Blendenpik because it appears to be the first
to exploit the concept of coherence for numerical purposes. We also wanted to get a
better understanding of the condition number bound for $AR_s^{-1}$ in [1, Theorem 3.2],
which contains an unspecified constant, and of the effect of other uniform sampling
strategies.
1.1. Overview and main results. We start with a brief sketch of the Blendenpik
algorithm (Section 2), which solves a least squares problem with the preconditioned
matrix $AR_s^{-1}$. For LSQR to converge, $AR_s^{-1}$ must be well conditioned, i.e. its
two-norm condition number $\kappa(AR_s^{-1}) \approx 1$. The neat observation in [1] is
that sampling rows from $FA$ amounts, conceptually, to sampling rows from an
orthonormal basis of $FA$. That is, if $FA = QR$ is a thin QR factorization and $S$ is a
sampling matrix, then $\kappa(AR_s^{-1}) = \kappa(SQ)$, due to the unitary invariance of the
two-norm. This means that it suffices to consider sampling from matrices with orthonormal
columns.

We consider two different sampling strategies. The first sampling strategy
(Section 3) samples exactly the requested number $c$ of rows, but with replacement. This
means a row can be sampled more than once. This strategy is the EXACTLY(c)
algorithm [8] and seems to be the one that is analyzed in [1]. We show (Corollary 3.11)
that the number of samples to achieve $\kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{(1+\epsilon)/(1-\epsilon)}$ with probability
at least $1-\delta$ is

\[ c \ge 3\, m \mu\, \frac{\ln(2n/\delta)}{\epsilon^2}. \qquad (1.1) \]

The threshold (1.1) implies that our bounds are informative only for matrices that
have sufficiently good coherence, and are tall and skinny (with many more rows than
columns).

The second strategy (Section 4) treats each row as an iid Bernoulli random
variable, and samples it with a specified probability. Hence no row is sampled more than
once, but the exact number of sampled rows is not known in advance. This strategy is
implemented in Blendenpik. We show (Theorem 3.13 and Corollary 4.6) that if each
row is sampled with probability $c/m$, then Bernoulli sampling has the same bound
as sampling with replacement. Hence sampling $c$ rows with replacement seems to be
comparable to sampling each row as an iid Bernoulli variable with probability $c/m$.

A numerical comparison of sampling strategies and bounds is easier if one can
produce matrices with specific coherence. We do this with an algorithm by Dhillon
et al. [5] that constructs matrices with prescribed eigenvalues and diagonal elements
(Section 5). We also present specific classes of matrices that can assume any
user-specified coherence and are faster to generate than running the algorithms. Please
note that our goal here is only to analyze the effect of coherence, but not to
consider transformations designed to improve coherence, such as subsampled randomized
Hadamard ^U\ or Fourier |21] transforms. 

Since the sampling strategies can be expected to work well in the asymptotic
regime of very large matrix dimensions, we present numerical experiments on matrices
of small dimension (Section 6). All experiments were performed with the Matlab
package kappa_SQ [12]. It implements the two sampling strategies, the different matrix
generation algorithms, and plots condition numbers and the various bounds. The
experiments illustrate that the sampling strategies produce well conditioned matrices,
even for small values of $c$, when the bounds are not informative. The experiments
confirm the accuracy of our bounds, and the validity of the threshold (1.1) for both
sampling strategies.

1.2. Literature. Existing randomized least squares methods are based on
randomized projections. That is, they multiply $A$ by some type of random matrix $F$,
and then sample a small number of rows from $FA$.

The algorithms in [3], [7], [5] solve a smaller sampled problem by a direct method.
Like Blendenpik, the algorithm in [17] computes a preconditioner from the QR
factorization of a sampled submatrix, but then solves the preconditioned problem by
applying the conjugate gradient method to the normal equations. The parallel solver
LSRN [M] computes a preconditioner from the SVD of a sampled submatrix, and
then solves the preconditioned problem with an iterative method. This solver is more
general than the others, because it applies to general rather than just full column
rank matrices.

As for randomized algorithms in general, the excellent surveys (TUl US] provide 
clear analyses and good intuition. 

1.3. Notation. The two-norm condition number of a $m \times n$ matrix $Z$ of rank$(Z) = n$
is denoted by $\kappa(Z) \equiv \|Z\|_2 \|Z^\dagger\|_2$, where $Z^\dagger$ is the Moore-Penrose inverse.

If $Z$ is a $m \times n$ matrix, $m \ge n$, then $Z = QR$ is a thin QR decomposition, if $Q$ is
$m \times n$ with orthonormal columns and $R$ is $n \times n$ upper triangular.

The $k \times k$ identity matrix is denoted by $I_k = (e_1 \; \ldots \; e_k)$.

2. The Blendenpik Algorithm. The Blendenpik algorithm [1, Algorithm 1]
solves full column rank least squares problems with the Krylov space method LSQR
[15] and a randomized preconditioner.

Algorithm 1 Sketch of Blendenpik [1]

Input: Real $m \times n$ matrix $A$ with rank$(A) = n$, real $m \times 1$ vector $b$
       Real $m \times m$ nonsingular "smoothing" matrix $F$
Output: Solution of $\min_x \|Ax - b\|_2$

1: $M = FA$ {Improve coherence}
2: $M_s = SM$ {Sample for preconditioner}
3: Thin QR factorization $M_s = Q_s R_s$ {Generate preconditioner}
4: Determine solution $y$ to $\min_z \|A R_s^{-1} z - b\|_2$ {Solve preconditioned problem}
5: Solve $R_s x = y$ {Recover solution to original problem}



2.1. Auxiliary results. Our condition number bounds are based on probabilistic
bounds for the eigenvalues of $Q^T S^T S Q$. The following result makes the connection
from eigenvalues to condition number of $SQ$.

Lemma 2.1. Let $Q$ be a $m \times n$ matrix with orthonormal columns, and $S$ a $k \times m$
matrix with $k \ge n$. If $\|Q^T S^T S Q - \alpha I_n\|_2 \le \alpha\epsilon$ for some $\alpha > 0$ and $0 < \epsilon < 1$, then
rank$(SQ) = n$ and

\[ \kappa(SQ) \le \sqrt{\frac{1+\epsilon}{1-\epsilon}}. \]

Proof. Let $\lambda_1(Q^T S^T S Q) \ge \cdots \ge \lambda_n(Q^T S^T S Q)$ be the eigenvalues of the $n \times n$
matrix $Q^T S^T S Q$. Eigenvalue perturbation bounds for Hermitian matrices imply [51
Corollary 8.1.6]

\[ |\lambda_j(Q^T S^T S Q) - \alpha| \le \|Q^T S^T S Q - \alpha I_n\|_2 \le \alpha\epsilon, \qquad 1 \le j \le n. \]

Since $k \ge n$ by assumption, the $k \times n$ matrix $SQ$ has singular values $\sigma_j(SQ) =
\sqrt{\lambda_j(Q^T S^T S Q)}$, $1 \le j \le n$. Hence $|\sigma_j(SQ)^2 - \alpha| \le \alpha\epsilon$. Together with $\alpha\epsilon > 0$
this implies $0 < \sqrt{\alpha(1-\epsilon)} \le \sigma_j(SQ) \le \sqrt{\alpha(1+\epsilon)}$. Thus rank$(SQ) = n$, $\|SQ\|_2 \le
\sqrt{\alpha(1+\epsilon)}$ and $\|(SQ)^\dagger\|_2 \le 1/\sqrt{\alpha(1-\epsilon)}$. □

The next result leads to the condition number of the preconditioned matrix for
the least squares problem.

Lemma 2.2. In Algorithm 1, if rank$(M_s) = n$, then $\kappa(AR_s^{-1}) = \kappa(SQ)$.

Proof. From $FA = QR$ and the fact that the two-norm is invariant under
premultiplication by matrices with orthonormal columns, it follows that

\[ \kappa(AR_s^{-1}) = \kappa(MR_s^{-1}) = \kappa(RR_s^{-1}) = \kappa(R_sR^{-1}) = \kappa(M_sR^{-1}) = \kappa(SMR^{-1}) = \kappa(SQ). \]

□



2.2. Coherence. In what follows we consider two different randomized sampling
matrices $S$ for line 2 of Algorithm 1 and derive probabilistic bounds for $\kappa(SQ) =
\kappa(AR_s^{-1})$. An important ingredient in these bounds is the coherence of the matrices
$Q$ and $M$.

Definition 2.3 (Definition 3.1 in [1], Definition 1.2 in [3]). If $M$ is a $m \times n$
matrix of rank$(M) = n$ with thin QR decomposition $M = QR$, then we define the
coherence of $Q$ and $M$ as

\[ \mu \equiv \max_{1 \le i \le m} \|e_i^T Q\|_2^2. \qquad (2.1) \]

Note that $n/m \le \mu \le 1$. There are other definitions of coherence that differ from the
above by factors depending on the matrix dimensions [16, Definition 1], [19, Definition 1].
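To make definition (2.1) concrete, the following is a minimal sketch in plain Python, not part of the kappa_SQ package. The function name `coherence` and the two small test matrices are illustrative assumptions. The first matrix has scaled Hadamard columns and attains the lower limit $n/m$; the second has columns of a permutation matrix and attains the upper limit 1.

```python
def coherence(Q):
    """Coherence (2.1): the largest squared Euclidean row norm of Q.

    Q is a list of rows and is assumed to have orthonormal columns;
    that assumption is not checked here.
    """
    return max(sum(x * x for x in row) for row in Q)


# 4 x 2 matrix with scaled Hadamard columns: all squared row norms equal n/m = 1/2.
Q_good = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]

# 4 x 2 matrix made of two columns of a permutation matrix: coherence 1.
Q_bad = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]

mu_good = coherence(Q_good)
mu_bad = coherence(Q_bad)
```

For a general full-rank $M$ one would first compute a thin QR factor and then apply the same row-norm maximum to $Q$.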

In the following sections we derive bounds similar to [1, Theorem 3.2], but with
all constants specified, for two different uniform sampling strategies. Throughout the
paper we use the following assumptions based on Algorithm 1.

Assumptions 2.1. The matrix $M = FA$ is $m \times n$ with $m \ge n$ and rank$(M) = n$.
It has a thin QR decomposition $M = QR$, where $Q$ is $m \times n$ with $Q^TQ = I_n$ and $R$
is $n \times n$ upper triangular. The coherence of $Q$ and $M$ is $\mu \equiv \max_{1 \le k \le m} \|e_k^T Q\|_2^2$.

If $S$ is a $k \times m$ sampling matrix with $k \ge n$, then $M_s \equiv SM$ has a thin QR
decomposition $M_s = Q_s R_s$, where $Q_s$ is $k \times n$ with orthonormal columns and $R_s$ is
$n \times n$ upper triangular.

3. Sampling with replacement. The first sampling strategy, presented as
Algorithm 2, is the EXACTLY(c) algorithm [8, Algorithm 3], which is also used in the
BasicMatrixMultiplication Algorithm [6, Fig. 2]. It samples exactly the requested
number of rows, but with replacement.

We present four different bounds for $\kappa(SQ)$ and $\kappa(AR_s^{-1})$ when $S$ is produced
by Algorithm 2, as well as a comparison among the first three bounds (Section 3.4).
Assumptions 2.1 hold for all subsequent bounds.

3.1. First bound. This bound is based on a probabilistic two-norm bound for a
Monte Carlo matrix multiplication algorithm that samples according to Algorithm 2.

Algorithm 2 EXACTLY(c) [8]

Input: Integers $m \ge 1$ and $1 \le c \le m$
       Probabilities $p_k > 0$, $\sum_{k=1}^m p_k = 1$
Output: $c \times m$ sampling matrix $S$ with $E[S^TS] = I_m$

$S = 0_{c \times m}$ {Initialize}
for $t = 1 : c$ do
    Sample $k_t$ from $\{1, \ldots, m\}$ with probability $p_{k_t}$,
    independently and with replacement
    $S = S + e_t e_{k_t}^T / \sqrt{c\, p_{k_t}}$
end for

Theorem 3.1 (Theorem 4 in [8]). Let $B$ be a $m \times n$ matrix, $m \ge n$, with
$\|B\|_2 \le 1$ and $\|B\|_F^2 \ge 1/24$. Let the probabilities in Algorithm 2 satisfy

\[ p_k \ge \beta \frac{\|e_k^T B\|_2^2}{\|B\|_F^2}, \qquad 1 \le k \le m, \]

for some $0 < \beta \le 1$. Given $0 < \epsilon < 1$ and $0 < \delta < 1$, let $c$ be an integer so that

\[ c \ge C \ln(C/\sqrt{\delta}), \qquad \text{where } C \equiv \frac{96 \|B\|_F^2}{\beta \epsilon^2}. \]

If $S$ is a $c \times m$ matrix produced by Algorithm 2 with the above $p_k$, then with probability
at least $1 - \delta$

\[ \|B^TB - (SB)^T(SB)\|_2 \le \epsilon. \]
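The sampling step of Algorithm 2 with uniform probabilities $p_k = 1/m$ can be sketched as follows. This is an illustrative plain-Python version, not the kappa_SQ implementation. The names `exactly_c_uniform` and `apply_sampling` are hypothetical, and the row scaling $1/\sqrt{c\,p_k} = \sqrt{m/c}$ is the one that makes $E[S^TS] = I_m$ hold. Instead of forming $S$ explicitly, the sketch returns the sampled row indices and the common scale factor.

```python
import math
import random


def exactly_c_uniform(m, c, rng):
    """Draw c row indices from range(m) uniformly, independently, with replacement.

    Returns (indices, scale), where scale = sqrt(m / c) is the factor that
    multiplies every sampled row when p_k = 1/m.
    """
    indices = [rng.randrange(m) for _ in range(c)]
    scale = math.sqrt(m / c)
    return indices, scale


def apply_sampling(Q, indices, scale):
    """Form SQ without building S: pick the sampled rows of Q and scale them."""
    return [[scale * x for x in Q[k]] for k in indices]


rng = random.Random(0)  # fixed seed only to make the example repeatable
Q = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]
idx, s = exactly_c_uniform(len(Q), 3, rng)
SQ = apply_sampling(Q, idx, s)
```

Because sampling is with replacement, `idx` may contain repeated indices, so `SQ` can have identical rows.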



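Theorem 3.2 (First bound for Algorithm 2). Given $0 < \epsilon < 1$ and $0 < \delta < 1$,
let $c$ be an integer so that

\[ \max\{n,\; C \ln(C/\sqrt{\delta})\} \le c \le m, \qquad \text{where } C \equiv \frac{96\, m \mu}{\epsilon^2}. \]

If $S$ is a $c \times m$ matrix produced by Algorithm 2 with uniform probabilities $p_k = 1/m$,
$1 \le k \le m$, then with probability at least $1 - \delta$, we have rank$(SQ) = $ rank$(M_s) = n$
and

\[ \kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{\frac{1+\epsilon}{1-\epsilon}}. \]

Proof. Apply Theorem 3.1 with $B = Q$ and $\beta = \frac{n}{m\mu}$. From the definition of
coherence (2.1) and $\|Q\|_F^2 = n$ follows

\[ \beta \frac{\|e_k^T Q\|_2^2}{\|Q\|_F^2} = \frac{\|e_k^T Q\|_2^2}{m\mu} \le \frac{1}{m} = p_k, \qquad 1 \le k \le m, \]

and

\[ \frac{96 \|Q\|_F^2}{\beta \epsilon^2} = \frac{96\, m\mu}{\epsilon^2}. \]

Thus, the conditions of Theorem 3.1 are fulfilled, and with probability at least $1 - \delta$,
$\|I_n - Q^TS^TSQ\|_2 \le \epsilon$. The result now follows from Lemmas 2.1 and 2.2. □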

Theorem 3.2 suggests that Algorithm 2 needs to sample more rows, if $Q$ and $M$
have high coherence, if $SQ$ and $AR_s^{-1}$ are to be well conditioned, or if the condition
number bound is to hold with high probability.

Remark 3.3 (Limitations, and sampling).
• Theorem 3.2 holds only for matrices of sufficiently low coherence since the
above restrictions on $c$ imply $\mu \le \epsilon^2/96$.

• Even with minimal coherence $\mu = n/m$, satisfying the restrictions on $c$
requires

\[ m \ge \frac{96 n}{\epsilon^2} \ln\left(\frac{96 n}{\epsilon^2 \sqrt{\delta}}\right). \]

That is, the bound in Theorem 3.2 is informative only if the number of rows
is sufficiently large compared to the number of columns, even with minimal
coherence.
• Theorem 3.2 implies that, in the best case, the minimal number of samples is

\[ c \ge 96\, \frac{n}{\epsilon^2} \ln\left(\frac{96 n}{\epsilon^2 \sqrt{\delta}}\right). \]

This follows from $\mu \ge n/m$.

3.2. Second bound. This bound is based on a Frobenius norm bound for a
Monte Carlo matrix multiplication algorithm that samples according to Algorithm 2.

Theorem 3.4 (Theorem 2 in [6]). Let $B$ be an $m \times n$ matrix, with $m \ge n$, and
let the probabilities in Algorithm 2 satisfy

\[ p_k \ge \beta \frac{\|e_k^T B\|_2^2}{\|B\|_F^2}, \qquad 1 \le k \le m, \]

for some $0 < \beta \le 1$. If $S$ is a $c \times m$ matrix produced by Algorithm 2 with the above
$p_k$, then for any $0 < \delta < 1$ with probability at least $1 - \delta$,

\[ \|B^TB - (SB)^T(SB)\|_F \le \frac{1 + \sqrt{(8/\beta)\log(1/\delta)}}{\sqrt{\beta c}}\, \|B\|_F^2. \]

Theorem 3.5 (Second bound for Algorithm 2). Given $0 < \delta < 1$ and $c \ge n$,
let

\[ \epsilon_2 \equiv \sqrt{\frac{m\mu}{c}} \left(\sqrt{n} + \sqrt{8\, m\mu \ln(1/\delta)}\right). \]

Let $S$ be a $c \times m$ matrix produced by Algorithm 2 with uniform probabilities $p_k = 1/m$,
$1 \le k \le m$. If $\epsilon_2 < 1$, then with probability at least $1 - \delta$, we have rank$(SQ) = $
rank$(M_s) = n$ and

\[ \kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{\frac{1+\epsilon_2}{1-\epsilon_2}}. \]

Proof. Apply Theorem 3.4 with $B = Q$ and $\beta = \frac{n}{m\mu}$. The definition of coherence
(2.1) and $\|Q\|_F^2 = n$ imply

\[ \beta \frac{\|e_k^T Q\|_2^2}{\|Q\|_F^2} = \frac{\|e_k^T Q\|_2^2}{m\mu} \le \frac{1}{m} = p_k, \qquad 1 \le k \le m. \]

Thus, the conditions of Theorem 3.4 are fulfilled, and with probability at least $1 - \delta$,
$\|I_n - Q^TS^TSQ\|_2 \le \|I_n - Q^TS^TSQ\|_F \le \epsilon_2$, where $\epsilon_2 = \frac{1 + \sqrt{(8/\beta)\log(1/\delta)}}{\sqrt{\beta c}} \|Q\|_F^2$. The
condition number bound follows from Lemmas 2.1 and 2.2, and the expression for $\epsilon_2$
follows from the above choice of $\beta$. □

This bound suggests that $SQ$ and $AR_s^{-1}$ are well conditioned, if Algorithm 2
samples many rows and if $Q$ and $M$ have low coherence. Since Algorithm 2 samples
with replacement, $\epsilon_2 > 0$ for $c = m$. However, $\epsilon_2$ decreases with increasing amount of
sampling.

Remark 3.6 (Limitations, and sampling).
• From $\epsilon_2 < 1$ follows that Theorem 3.5 holds for matrices whose coherence is
bounded by

\[ \mu \le \frac{n}{4 \ell m} \left( \sqrt{\frac{4\sqrt{c\,\ell}}{n} + 1} - 1 \right)^2, \qquad \text{where } \ell \equiv 8 \ln(1/\delta). \]

• Theorem 3.5 implies that the number of samples required for rank$(SQ) = $
rank$(M_s) = n$ and $\kappa(AR_s^{-1}) \le \sqrt{(1+\epsilon)/(1-\epsilon)}$ to hold with probability at least $1 - \delta$ is

\[ c \ge \frac{m\mu}{\epsilon^2} \left( \sqrt{n} + \sqrt{m\mu\,\ell} \right)^2. \]

Since $\mu \ge n/m$, in the best case the minimal number of samples is

\[ c \ge \frac{n^2}{\epsilon^2} \left( 1 + \sqrt{8 \ln(1/\delta)} \right)^2. \]

• Even with minimal coherence $\mu = n/m$, assuring that $\epsilon_2 < 1$ and $c \le m$ (for
otherwise a deterministic algorithm is as fast) implies

\[ m \ge n^2 \left( 1 + \sqrt{8 \ln(1/\delta)} \right)^2. \]

That is, the bound in Theorem 3.5 is informative only if the number of rows
is sufficiently large compared to the number of columns, even with minimal
coherence.

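3.3. Third bound. This bound is based on the noncommutative Bernstein
inequality, specialized to square matrices.

Theorem 3.7 (Theorem 4 in [16]). Let $X_k$ be independent random $n \times n$ matrices
with $E[X_k] = 0$, $1 \le k \le m$. Let $\rho_k \equiv \max\{\|E[X_kX_k^T]\|_2, \|E[X_k^TX_k]\|_2\}$ and $\|X_k\|_2 \le
\tau$. Then for any $\epsilon > 0$ we have $\|\sum_{k=1}^m X_k\|_2 \le \epsilon$ with probability at least

\[ 1 - 2n \exp\left( -\frac{3}{2}\, \frac{\epsilon^2}{3 \sum_{k=1}^m \rho_k + \tau\epsilon} \right). \]

Theorem 3.8 (Third bound for Algorithm 2). Given $c \ge n$ and $0 < \epsilon < 1$, set

\[ \delta \equiv 2n \exp\left( -\frac{3}{2}\, \frac{c\,\epsilon^2}{m\mu\,(3 + \epsilon)} \right). \]

Let $S$ be a $c \times m$ matrix produced by Algorithm 2 with uniform probabilities $p_k = 1/m$,
$1 \le k \le m$. If $\delta < 1$ then with probability at least $1 - \delta$, we have rank$(SQ) = $
rank$(M_s) = n$ and

\[ \kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{\frac{1+\epsilon}{1-\epsilon}}. \]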



Proof. The proof is similar to that of [H Lemma 3]. Represent the outcome of
sampling with Algorithm 2 by $Q^TS^TSQ = \sum_{t=1}^c Y_t$, where $Y_t \equiv \frac{m}{c} Q^Te_{k_t}e_{k_t}^TQ$ are
$n \times n$ matrices, $1 \le t \le c$. The zero mean versions are $X_t \equiv Y_t - \frac{1}{c} I_n$. To apply
Theorem 3.7 to the $X_t$ we need to verify that they fulfill the required conditions.
First, by construction, $E[X_t] = 0$, $1 \le t \le c$. Second, since $Y_t$ and $I_n$ are symmetric
positive semidefinite,

\[ \|X_t\|_2 \le \max\{\|Y_t\|_2, \|\tfrac{1}{c} I_n\|_2\} = \frac{1}{c} \max\{m \|e_{k_t}^TQ\|_2^2,\, 1\} \le \frac{m\mu}{c}, \]

where the last inequality follows from the definition of $\mu$ and $\mu \ge n/m$. Third, since
$X_t$ is symmetric,

\[ X_t^TX_t = X_tX_t^T = X_t^2 = \frac{1}{c^2} \left( I_n - 2m\, Q^Te_{k_t}e_{k_t}^TQ + m^2 \|e_{k_t}^TQ\|_2^2\, Q^Te_{k_t}e_{k_t}^TQ \right), \]

so that

\[ E[X_t^2] = \frac{1}{c^2} \left( -I_n + m \sum_{k=1}^m \|e_k^TQ\|_2^2\, Q^Te_ke_k^TQ \right). \]

Positive semidefiniteness and the definition of $\mu$ give, as above,

\[ \|E[X_t^2]\|_2 \le \frac{1}{c^2} \max\{1,\, m\mu\} = \frac{m\mu}{c^2}. \]

Applying Theorem 3.7 with $\tau = m\mu/c$ and $\rho_t = m\mu/c^2$ shows that $\|\sum_{t=1}^c X_t\|_2 \le \epsilon$
with probability at least $1 - \delta$. In this context we have

\[ \sum_{t=1}^c X_t = \sum_{t=1}^c \left( Y_t - \tfrac{1}{c} I_n \right) = (SQ)^T(SQ) - I_n, \]

so that $\|(SQ)^T(SQ) - I_n\|_2 \le \epsilon$. The condition number bound follows from Lemmas
2.1 and 2.2. □

Theorem 3.8 suggests that the condition number bound holds with high
probability, if Algorithm 2 samples many rows, and if $Q$ and $M$ have low coherence.
Again, since Algorithm 2 samples with replacement, $\delta > 0$ for $c = m$, but the failure
probability goes to zero with more sampling.

Remark 3.9 (Limitations, and sampling).
• Theorem 3.8 holds only for matrices of sufficiently low coherence since $\epsilon_1 < 1$
(with $\epsilon_1$ as in Corollary 3.10) implies

\[ \mu \le \frac{3}{8}\, \frac{c}{m \ln(2n/\delta)}. \]

• Even with minimal coherence $\mu = n/m$, assuring that $\epsilon_1 < 1$ requires

\[ \frac{8}{3}\, n \ln(2n/\delta) \le c \le m, \]

where the amount of sampling is $c \le m$, for otherwise a deterministic
algorithm is as fast. That is, the bound in Theorem 3.8 is informative only if the
number of rows is sufficiently large compared to the number of columns, even
with minimal coherence.
• Theorem 3.8 implies that the number of samples required by Algorithm 2 to
achieve $\kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{(1+\epsilon)/(1-\epsilon)}$ with probability at least $1 - \delta$ is

\[ c \ge \frac{8}{3}\, \frac{m\mu}{\epsilon^2} \ln(2n/\delta). \]

Since $\mu \ge n/m$, in the best case the number of samples is

\[ c \ge \frac{8}{3}\, \frac{n}{\epsilon^2} \ln(2n/\delta). \]

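Below we bring Theorem 3.8 into the same form as Theorem 3.5, with a focus
on $\epsilon$.

Corollary 3.10 (Alternative expression for third bound for Algorithm 2).
Given $c \ge n$ and $0 < \delta < 1$, let $\ell \equiv \frac{2}{3} \ln(2n/\delta)$ and

\[ \epsilon_1 \equiv \frac{\ell\, m\mu}{2c} \left( 1 + \sqrt{1 + \frac{12\, c}{\ell\, m\mu}} \right). \]

Let $S$ be a $c \times m$ matrix produced by Algorithm 2. If $\epsilon_1 < 1$ then with probability at
least $1 - \delta$, we have rank$(SQ) = $ rank$(M_s) = n$ and

\[ \kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{\frac{1+\epsilon_1}{1-\epsilon_1}}. \]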



3.4. Comparison of bounds for Algorithm 2. We compare the three previous
bounds based on the number of samples required by Algorithm 2 to achieve full
rank for the sampled matrices and a specified bound on the condition number, with
a desired success probability $1 - \delta$. The bound that results in the smallest number of
samples $c$ is judged to be the best.

Corollary 3.11 (Threshold for amount of sampling). For a given $0 < \epsilon < 1$
and $0 < \delta < 1$, the bound for $c$ in Remark 3.9 is smaller than the bounds in Remarks
3.3 and 3.6.

Thus, given $0 < \epsilon < 1$ and $0 < \delta < 1$, for Algorithm 2 to achieve
$\kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{(1+\epsilon)/(1-\epsilon)}$ with probability at least $1 - \delta$
it suffices that

\[ c \ge \frac{8}{3}\, \frac{m\mu}{\epsilon^2} \ln(2n/\delta). \]

The experiments in Sections 6.3 and 6.4 confirm the conclusion in Corollary 3.11.
They illustrate that Theorem 3.8 is tighter than Theorems 3.2 or 3.5 and that the
threshold for $c$ is realistic.
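The comparison behind Corollary 3.11 can be checked numerically. The sketch below, in plain Python, evaluates the three sample-count thresholds as we read them from Remarks 3.3, 3.6 and 3.9: $C\ln(C/\sqrt{\delta})$ with $C = 96m\mu/\epsilon^2$, then $(m\mu/\epsilon^2)(\sqrt{n} + \sqrt{8m\mu\ln(1/\delta)})^2$, and finally $(8/3)(m\mu/\epsilon^2)\ln(2n/\delta)$. The function names and the test dimensions are illustrative assumptions, and the requirement $c \ge n$ is ignored for simplicity.

```python
import math


def samples_first_bound(m, mu, eps, delta):
    """Threshold C * ln(C / sqrt(delta)) with C = 96 m mu / eps^2 (Remark 3.3)."""
    C = 96.0 * m * mu / eps**2
    return C * math.log(C / math.sqrt(delta))


def samples_second_bound(m, n, mu, eps, delta):
    """Threshold (m mu / eps^2) (sqrt(n) + sqrt(m mu l))^2, l = 8 ln(1/delta) (Remark 3.6)."""
    ell = 8.0 * math.log(1.0 / delta)
    return (m * mu / eps**2) * (math.sqrt(n) + math.sqrt(m * mu * ell)) ** 2


def samples_third_bound(m, n, mu, eps, delta):
    """Threshold (8/3) (m mu / eps^2) ln(2n / delta) (Remark 3.9)."""
    return (8.0 / 3.0) * (m * mu / eps**2) * math.log(2.0 * n / delta)


# Tall and skinny example with minimal coherence mu = n/m.
m, n = 10_000, 5
mu, eps, delta = n / m, 0.5, 0.01
c1 = samples_first_bound(m, mu, eps, delta)
c2 = samples_second_bound(m, n, mu, eps, delta)
c3 = samples_third_bound(m, n, mu, eps, delta)
```

For these inputs the third threshold is the smallest of the three, and it is the only one well below $m$.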

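3.5. Fourth Bound. This bound is based on an eigenvalue bound for sums of
random Hermitian positive semidefinite matrices.

Theorem 3.12 (Corollary 5.2 in [21]). Let $X_k$ be independent random $n \times n$
Hermitian positive semidefinite matrices with $\|X_k\|_2 \le \tau$, $1 \le k \le m$, and define the
eigenvalues

\[ \omega_j \equiv \lambda_j\left( \sum_{k=1}^m E[X_k] \right), \qquad 1 \le j \le n, \]

labelled so that $\omega_1 \ge \cdots \ge \omega_n$. Then for any $0 \le t < 1$

\[ \Pr\left[ \lambda_n\left( \sum_{k=1}^m X_k \right) \le (1-t)\,\omega_n \right] \le n \left( e^{-t} (1-t)^{-(1-t)} \right)^{\omega_n/\tau}, \]

and for any $t \ge 0$

\[ \Pr\left[ \lambda_1\left( \sum_{k=1}^m X_k \right) \ge (1+t)\,\omega_1 \right] \le n \left( e^{t} (1+t)^{-(1+t)} \right)^{\omega_1/\tau}. \]

Theorem 3.13 (Fourth bound for Algorithm 2). Given $m \ge n$, $c \ge n$, and
$0 < \epsilon < 1$, set

\[ \delta \equiv n \left( \left( e^{-\epsilon} (1-\epsilon)^{-(1-\epsilon)} \right)^{c/(m\mu)} + \left( e^{\epsilon} (1+\epsilon)^{-(1+\epsilon)} \right)^{c/(m\mu)} \right). \]

Let $S$ be a $c \times m$ matrix produced by Algorithm 2. If $\delta < 1$, then with probability at
least $1 - \delta$, we have rank$(SQ) = $ rank$(M_s) = n$, and

\[ \kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{\frac{1+\epsilon}{1-\epsilon}}. \]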



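Proof. The proof is somewhat similar to [20, Lemma 3.4]. Set $X_t \equiv \frac{m}{c} Q^Te_{k_t}e_{k_t}^TQ$,
$1 \le t \le c$. Then $X_t$ is $n \times n$ Hermitian positive semidefinite, $\|X_t\|_2 \le \frac{m}{c} \|e_{k_t}^TQ\|_2^2 \le
\frac{m\mu}{c}$, so that $\tau = m\mu/c$. Furthermore, $E[X_t] = \frac{1}{c} \sum_{k=1}^m Q^Te_ke_k^TQ = \frac{1}{c} I_n$, so that

\[ \omega_j = \lambda_j\left( \sum_{t=1}^c E[X_t] \right) = \lambda_j(I_n) = 1, \qquad 1 \le j \le n. \]

Applying Theorem 3.12 to $\sum_{t=1}^c X_t = Q^TS^TSQ$ gives

\[ \Pr\left[ \lambda_n(Q^TS^TSQ) \le 1 - \epsilon \right] \le n \left( e^{-\epsilon} (1-\epsilon)^{-(1-\epsilon)} \right)^{c/(m\mu)}, \]

\[ \Pr\left[ \lambda_1(Q^TS^TSQ) \ge 1 + \epsilon \right] \le n \left( e^{\epsilon} (1+\epsilon)^{-(1+\epsilon)} \right)^{c/(m\mu)}. \]

The result now follows from Boole's inequality [18, p. 16] and Lemmas 2.1 and 2.2.
□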
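Since Theorem 3.13 gives $\delta$ as a function of $\epsilon$ rather than the other way round, finding the smallest $\epsilon$ for a prescribed failure probability requires a numerical solve. The following plain-Python sketch does this by bisection. It is an illustration under our own assumptions (the function names, the tolerance and the bisection strategy are ours), not the procedure used in kappa_SQ. Bisection is justified because both terms of $\delta$ decrease monotonically in $\epsilon$ on $(0,1)$: the logarithms of the two bases have derivatives $\ln(1-\epsilon)$ and $-\ln(1+\epsilon)$, both negative.

```python
import math


def delta_fourth_bound(n, m, mu, c, eps):
    """Failure probability delta of Theorem 3.13 for given eps in (0, 1)."""
    expo = c / (m * mu)
    lower = (math.exp(-eps) * (1.0 - eps) ** (-(1.0 - eps))) ** expo
    upper = (math.exp(eps) * (1.0 + eps) ** (-(1.0 + eps))) ** expo
    return n * (lower + upper)


def smallest_eps(n, m, mu, c, delta, tol=1e-10):
    """Smallest eps in (0, 1) with delta_fourth_bound(...) <= delta, or None.

    Relies on delta_fourth_bound being decreasing in eps.
    """
    lo, hi = 0.0, 1.0 - 1e-12
    if delta_fourth_bound(n, m, mu, c, hi) > delta:
        return None  # the bound is not informative for these parameters
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delta_fourth_bound(n, m, mu, c, mid) <= delta:
            hi = mid
        else:
            lo = mid
    return hi


# Example: minimal coherence mu = n/m, c = 500 samples, failure probability 0.01.
n, m, c = 5, 10_000, 500
eps_star = smallest_eps(n, m, n / m, c, 0.01)
```

With $\gamma = c/m$ the same expression for $\delta$ appears in Corollary 4.6, so the same routine applies to Bernoulli sampling.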

In Section 6.3 we present experiments to compare Theorem 3.13 to the preceding
bounds. The experiments suggest that Theorem 3.13 is as tight as Theorem [



4. Bernoulli sampling. The second sampling strategy, presented as Algorithm 3,
is implemented in Blendenpik [1, Algorithm 1]. We call it Bernoulli sampling,
because it treats each row as an independent, identically distributed Bernoulli random
variable. Each row is either sampled or not, with the same probability for each row.

The input variable $\gamma$ controls the expected number of sampled rows. Note that
Algorithm 3 produces a square matrix $S$, in contrast to Algorithm 2, which produces
a $c \times m$ matrix $S$.

We present two bounds for $\kappa(SQ)$ and $\kappa(AR_s^{-1})$ when $S$ is produced by
Algorithm 3. Assumptions 2.1 hold for all subsequent bounds.
Algorithm 3 Bernoulli sampling [1]

Input: Integer $m \ge 1$, and $\gamma$ with $0 < \gamma < 1$
Output: $m \times m$ sampling matrix $S$ with $E[S^TS] = \gamma I_m$

$S = 0_{m \times m}$ {Initialize}
for $t = 1 : m$ do
    $S_{tt} = 1$ with probability $\gamma$, and $S_{tt} = 0$ with probability $1 - \gamma$
end for
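Algorithm 3 can be sketched in a few lines of plain Python. The function name `bernoulli_sample` is a hypothetical choice, and the sketch returns the indices $t$ with $S_{tt} = 1$ instead of forming the $m \times m$ diagonal matrix $S$, since the zero rows of $SQ$ do not affect its rank or its nonzero singular values.

```python
import random


def bernoulli_sample(m, gamma, rng):
    """Return the sorted indices of the rows kept by Bernoulli sampling.

    Every row index in range(m) is kept independently with probability gamma,
    so no index occurs twice and the number of kept rows is random.
    """
    return [t for t in range(m) if rng.random() < gamma]


rng = random.Random(1)  # fixed seed only to make the example repeatable
kept = bernoulli_sample(1000, 0.1, rng)
```

The expected length of `kept` is $\gamma m$, here 100, but individual runs deviate from it, which is the main practical difference from sampling exactly $c$ rows with replacement.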



4.1. First Bound. This bound is again based on the noncommutative Bernstein
inequality in Theorem 3.7.

Theorem 4.1 (First bound for Algorithm 3). Given $m \ge n$, $0 < \gamma < 1$ and
$0 < \epsilon < 1$, set $\omega \equiv \epsilon/\mu$ and

\[ \delta \equiv 2n \exp\left( -\frac{3}{2}\, \frac{\gamma\,\omega^2}{3m(1-\gamma) + \max\{1-\gamma, \gamma\}\,\omega} \right). \]

Let $S$ be a $m \times m$ matrix produced by Algorithm 3. If $\delta < 1$ then with probability at
least $1 - \delta$, we have rank$(SQ) = $ rank$(M_s) = n$ and

\[ \kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{\frac{1+\epsilon}{1-\epsilon}}. \]

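Proof. The proof is similar to that of [H Lemma 3]. Set

\[ X_k \equiv \begin{cases} (1-\gamma)\,(Q^Te_ke_k^TQ) & \text{with probability } \gamma \\ -\gamma\,(Q^Te_ke_k^TQ) & \text{with probability } 1-\gamma \end{cases} \qquad 1 \le k \le m. \]

To apply Theorem 3.7 to the $X_k$ we need to verify that they fulfill the required
conditions. First, $E[X_k] = 0$, for $1 \le k \le m$, by construction. Second,

\[ \|X_k\|_2 \le \max\{(1-\gamma)\|e_k^TQ\|_2^2,\; \gamma\|e_k^TQ\|_2^2\} \le \max\{1-\gamma, \gamma\}\,\mu. \]

Third, since the $X_k$ are symmetric,

\[ E[X_kX_k^T] = E[X_k^TX_k] = E[X_k^2] = \gamma(1-\gamma)\, \|e_k^TQ\|_2^2\; Q^Te_ke_k^TQ, \qquad 1 \le k \le m. \]

The definition of $\mu$ implies $\|E[X_k^2]\|_2 \le \gamma(1-\gamma)\mu^2$, $1 \le k \le m$. Applying Theorem 3.7
with $\tau = \max\{1-\gamma, \gamma\}\mu$ and $\rho_k = \gamma(1-\gamma)\mu^2$ shows that $\|\sum_{k=1}^m X_k\|_2 \le \hat\epsilon$ with
probability at least $1 - \delta$, where

\[ \delta = 2n \exp\left( -\frac{\hat\epsilon^2}{2\mu \left( m\gamma(1-\gamma)\mu + \max\{1-\gamma, \gamma\}\,\hat\epsilon/3 \right)} \right). \]

In this context we have

\[ \sum_{k=1}^m X_k = (SQ)^T(SQ) - \gamma \sum_{k=1}^m Q^Te_ke_k^TQ = (SQ)^T(SQ) - \gamma I_n. \]

As in the proof of Theorem 3.2 we conclude that $\|(SQ)^TSQ - \gamma I_n\|_2 = \|\sum_{k=1}^m X_k\|_2 \le
\hat\epsilon$. Now set $\hat\epsilon = \gamma\epsilon$, and apply Lemmas 2.1 and 2.2. □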

Remark 4.2. The success probability $1 - \delta$ in Theorem 4.1 is an increasing
function of $\gamma$ and $\omega$.

• The scalar $\gamma$ represents the amount of sampling. If $\gamma = 1$ then $S = I_m$ and
all rows are sampled.
For $\frac{1}{2} \le \gamma < 1$

\[ \delta = 2n \exp\left( -\frac{3}{2}\, \frac{\gamma\,\omega^2}{3m(1-\gamma) + \gamma\,\omega} \right), \]

while for $0 < \gamma \le \frac{1}{2}$

\[ \delta = 2n \exp\left( -\frac{3}{2}\, \frac{\gamma}{1-\gamma}\, \frac{\omega^2}{3m + \omega} \right). \]

Hence $1 - \delta$ is an increasing function in $\gamma$. This means that the more rows are
sampled, the higher the probability that $\kappa(SQ)$ and $\kappa(AR_s^{-1})$ satisfy the bound
in Theorem 4.1.
• The scalar $\omega = \epsilon/\mu$ represents a trade-off between error and coherence.
The above expressions for $\delta$ show that $1 - \delta$ is an increasing function in $\omega$.
That is, if we fix the error $\epsilon$, and consider matrices with better coherence (i.e.
smaller $\mu$), then the probability increases that $\kappa(SQ)$ and $\kappa(AR_s^{-1})$ satisfy the
bound in Theorem 4.1. On the other hand, if we fix the coherence, but relax
the error requirements (i.e. larger $\epsilon$), then the probability also increases.
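The monotonicity described in Remark 4.2 can be illustrated by evaluating the failure probability of Theorem 4.1, $\delta = 2n\exp\left(-\frac{3}{2}\,\gamma\omega^2 / (3m(1-\gamma) + \max\{1-\gamma,\gamma\}\,\omega)\right)$ with $\omega = \epsilon/\mu$, on a grid of sampling probabilities. This is a small plain-Python sketch. The function name and the chosen dimensions are illustrative assumptions.

```python
import math


def delta_bernoulli_first(n, m, mu, gamma, eps):
    """Failure probability delta of Theorem 4.1, with omega = eps / mu."""
    omega = eps / mu
    denom = 3.0 * m * (1.0 - gamma) + max(1.0 - gamma, gamma) * omega
    return 2.0 * n * math.exp(-1.5 * gamma * omega**2 / denom)


# Minimal coherence mu = n/m and fixed eps; only gamma varies.
n, m, eps = 5, 10_000, 0.5
gammas = [0.05, 0.1, 0.3, 0.5, 0.7, 0.9]
deltas = [delta_bernoulli_first(n, m, n / m, g, eps) for g in gammas]
```

The values in `deltas` decrease as $\gamma$ grows, so the success probability $1-\delta$ increases. Repeating the experiment with a larger `eps` or a smaller `mu` lowers every entry, which is the dependence on $\omega$ noted in the second bullet.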
Below we bring Theorem 4.1 into the same form as Theorem 3.5.
Corollary 4.3 (Alternative expression for first bound for Algorithm 3).
Given tti > 71, < 7 < 1 and < 6 < 1, let i ^ ^ \n{2n/S) and 




i,^fif^i+ L^l2mi + ^H^ ^ ,.-7 -M ^ 

Let $S$ be a $m \times m$ matrix produced by Algorithm 3. If $\epsilon_3 < 1$ then with probability at
least $1 - \delta$, we have rank$(SQ) = $ rank$(M_s) = n$ and

\[ \kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{\frac{1+\epsilon_3}{1-\epsilon_3}}. \]

Corollary 4.3 shows that $\epsilon_3 \to 0$ as $\gamma \to 1$, which means the condition number is
guaranteed to approach 1 as the probability of sampling approaches 1. This is in
contrast to the bounds for Algorithm 2 in Section 3.

In order to compare the sampling strategy in Algorithm [3] with the one in Algo- 
rithm [2] we set 7 so that the expected number of sampled rows is equal to c, and we 
assume that the number of sampled rows is much less than the number of available 
rows. 

Remark 4.4 (Limitations, and sampling). Given n < c < m/2 and < S < 1, 
let£=l ln{2n/6) and 



fim 



(i+VT2d+py (4.1) 



Let $S$ be a $m \times m$ matrix produced by Algorithm 3 with $\gamma = c/m$.
If $\epsilon_3 < 1$, then with probability at least $1 - \delta$, we have rank$(SQ) = $ rank$(M_s) =
n$ and

\[ \kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{\frac{1+\epsilon_3}{1-\epsilon_3}}. \]
Theorem 4.1 implies that Algorithm 3 with $\gamma = c/m$ and $c \le m/2$ achieves
$\kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{(1+\epsilon)/(1-\epsilon)}$ with probability at least $1 - \delta$ if



c>-—^ ln(2nM). 



2 TO/i 

3 I2 

This bound suggests that Algorithm 3 with $\gamma = c/m$ and small $c$ is comparable
to Algorithm 2.
• Even with minimal coherence $\mu = n/m$, assuring that $\epsilon < 1$ requires

2 

c> - nln{2n/5). 
o 

Since by assumption c < m/2, this means m > -^ nh\{2n/5). That is, the 
bound (4.1) is informative only if the number of rows is sufficiently large
compared to the number of columns, even with minimal coherence. 

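4.2. Second Bound. This bound is again based on an eigenvalue bound for
sums of random Hermitian positive semidefinite matrices.

Theorem 4.5 (Second bound for Algorithm 3). Given $m \ge n$, $0 < \gamma < 1$ and
$0 < \epsilon < 1$, set

\[ \delta \equiv n \left( \left( e^{-\epsilon} (1-\epsilon)^{-(1-\epsilon)} \right)^{\gamma/\mu} + \left( e^{\epsilon} (1+\epsilon)^{-(1+\epsilon)} \right)^{\gamma/\mu} \right). \]

Let $S$ be a $m \times m$ matrix produced by Algorithm 3. If $\delta < 1$, then with probability at
least $1 - \delta$, we have rank$(SQ) = $ rank$(M_s) = n$, and

\[ \kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{\frac{1+\epsilon}{1-\epsilon}}. \]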



Proof. The proof is somewhat similar to [20, Lemma 3.4]. Set

\[ X_k \equiv \begin{cases} Q^Te_ke_k^TQ & \text{with probability } \gamma \\ 0 & \text{with probability } 1-\gamma \end{cases} \qquad 1 \le k \le m. \]

Then $X_k$ is $n \times n$ Hermitian positive semidefinite, $\|X_k\|_2 \le \|e_k^TQ\|_2^2 \le \mu$, so that
$\tau = \mu$. Furthermore, since $X_k$ is a Bernoulli random variable, $E[X_k] = \gamma\, Q^Te_ke_k^TQ$,
so that

\[ \omega_j = \lambda_j\left( \sum_{k=1}^m E[X_k] \right) = \lambda_j(\gamma I_n) = \gamma, \qquad 1 \le j \le n. \]

Applying Theorem 3.12 to $\sum_{k=1}^m X_k = Q^TS^TSQ$ gives

\[ \Pr\left[ \lambda_n(Q^TS^TSQ) \le (1-\epsilon)\gamma \right] \le n \left( e^{-\epsilon} (1-\epsilon)^{-(1-\epsilon)} \right)^{\gamma/\mu}, \]

\[ \Pr\left[ \lambda_1(Q^TS^TSQ) \ge (1+\epsilon)\gamma \right] \le n \left( e^{\epsilon} (1+\epsilon)^{-(1+\epsilon)} \right)^{\gamma/\mu}. \]

The result now follows from Boole's inequality [18, p. 16] and Lemmas 2.1 and 2.2.
□

If we choose the sampling probability $\gamma = c/m$, so that Algorithm 3 samples $c$ rows
in expectation, then Theorem 4.5 gives exactly the same bound as Theorem 3.13 for
sampling with Algorithm 2.

Corollary 4.6 (Second bound for Algorithm 3). Given $m \ge n$, $c \ge n$, $\gamma = c/m$
and $0 < \epsilon < 1$, set

\[ \delta \equiv n \left( \left( e^{-\epsilon} (1-\epsilon)^{-(1-\epsilon)} \right)^{c/(m\mu)} + \left( e^{\epsilon} (1+\epsilon)^{-(1+\epsilon)} \right)^{c/(m\mu)} \right). \]

Let $S$ be a $m \times m$ matrix produced by Algorithm 3. If $\delta < 1$, then with probability at
least $1 - \delta$, we have rank$(SQ) = $ rank$(M_s) = n$, and

\[ \kappa(SQ) = \kappa(AR_s^{-1}) \le \sqrt{\frac{1+\epsilon}{1-\epsilon}}. \]

In Section 6.3 we present experiments to compare Corollaries 4.6 and 4.3. The
experiments suggest that Corollary 4.6 is much tighter for sampling probabilities
$\gamma = c/m$ with $c \le m/2$. This is also confirmed by the experiments in Section 6.4,
which compare Corollary 4.6 against actual condition numbers.

Corollary 4.3 only becomes tighter for large sampling amounts $c \approx m$, because
$\epsilon_3 \to 0$ as $\gamma \to 1$.

5. Algorithms for generating matrices with specific coherence. In order
to test the bounds in Sections 3 and 4, we need to generate matrices with specific
coherence.

We first show that such matrices always exist (Section 5.1). Then we present
algorithms that generate these matrices (Section 5.2), and also two classes of structured
matrices that can take on any specified coherence (Section 5.3). Finally, we discuss
the numerical computation of $\epsilon$ in Theorem 3.13 and Corollary 4.6 (Section F



5.1. Existence. We show that for given integers $m$ and $n$ with $m \ge n \ge 1$,
and a desired coherence $\mu$, there always exists a $m \times n$ matrix $U$ with orthonormal
columns so that $\mu(U) = \mu$. To this end we use the fact that the diagonal elements of
a Hermitian matrix majorize its eigenvalues.

Definition 5.1 (Definition 4.3.24 in [H]). Given real vectors $a$ and $\lambda$ with
elements $a_1 \le \cdots \le a_m$ and $\lambda_1 \le \cdots \le \lambda_m$. Then $a$ majorizes $\lambda$ if

\[ \sum_{j=1}^k \lambda_j \le \sum_{j=1}^k a_j, \quad 1 \le k \le m-1, \qquad \sum_{j=1}^m \lambda_j = \sum_{j=1}^m a_j. \]

Theorem 5.2 (Theorem 4.3.32 in [H]). Let $a$ and $\lambda$ be vectors with elements
$a_1 \le \cdots \le a_m$ and $\lambda_1 \le \cdots \le \lambda_m$. If $a$ majorizes $\lambda$ then there exists a $m \times m$
Hermitian matrix $W$ with diagonal elements $W_{jj} = a_j$ and eigenvalues $\lambda_j$, $1 \le j \le
m$.

We use this theorem to show that there always exists a matrix with orthonormal
columns that has prescribed row norms, and in particular, prescribed coherence.

Theorem 5.3 (Existence of matrices with specified coherence). Given integers
$m$ and $n$ with $m \ge n \ge 1$, a real number $\mu_0$ with $n/m \le \mu_0 \le 1$, and a vector
$a$ with elements $0 \le a_1 \le \cdots \le a_{m-1} \le a_m = \mu_0$ and $\sum_{j=1}^m a_j = n$. Then there
exists a $m \times n$ matrix $Q$ with orthonormal columns that has coherence $\mu = \mu_0$ and
$\|e_j^TQ\|_2^2 = a_j$, $1 \le j \le m$.

Proof. Let $\lambda$ be a vector with $\lambda_j = 0$ for $1 \le j \le m-n$, and $\lambda_j = 1$ for
$m-n+1 \le j \le m$. We first show that $a$ majorizes $\lambda$. Since $a_j \ge 0$,

\[ \sum_{j=1}^k \lambda_j = 0 \le \sum_{j=1}^k a_j, \qquad 1 \le k \le m-n. \]

Since $a_j \le 1$ and $\sum_{j=1}^m a_j = n$,

\[ \sum_{j=1}^{m-n+k} \lambda_j = k = n - (n-k) \le n - \sum_{j=m-n+k+1}^{m} a_j = \sum_{j=1}^{m-n+k} a_j, \qquad 1 \le k \le n. \]

Thus $a$ majorizes $\lambda$, and Theorem 5.2 implies that there exists a Hermitian matrix
$W$ with $W_{jj} = a_j$ and eigenvalues $\lambda_j$, $1 \le j \le m$.

Since $W$ is Hermitian with $n$ eigenvalues equal to one, and all other eigenvalues
equal to zero, it has an eigenvalue decomposition

\[ W = \hat{Q} \begin{pmatrix} 0_{m-n} & \\ & I_n \end{pmatrix} \hat{Q}^T, \]

where $\hat{Q}$ is $m \times m$ real orthogonal, and $Q \equiv \hat{Q} \begin{pmatrix} 0 \\ I_n \end{pmatrix}$ has $n$ orthonormal columns.
From $a$ being the diagonal of $W$ follows $a_j = W_{jj} = e_j^TQQ^Te_j = \|e_j^TQ\|_2^2$ and
$\mu_0 = a_m = \|e_m^TQ\|_2^2 = \max_{1 \le j \le m} \|e_j^TQ\|_2^2 = \mu$. □

5.2. Algorithm. We use a transposed version of [5, Algorithm 3] to generate
matrices with orthonormal columns that have specified row norms, and in particular,
specified coherence (the corresponding implementation is part of [12]). Theorem 5.3
assures that this is possible if the majorization conditions are satisfied.

Algorithm 4 Generating a matrix with specified coherence [5]

Input: Integers $m$ and $n$ with $m \ge n \ge 1$, real number $\mu_0$ with $n/m \le \mu_0 \le 1$
       Vector $a$ with elements $0 \le a_1 \le \cdots \le a_m = \mu_0$ and $\sum_{j=1}^m a_j = n$
Output: $m \times n$ matrix $Q$ with $Q^TQ = I_n$, coherence $\mu = \mu_0$,
        and $\|e_j^TQ\|_2^2 = a_j$, $1 \le j \le m$

Set $Q = (I_n \;\; 0_{n \times (m-n)})^T$ {Initialization}
while not finished do
    Find indices $i < j$ with
        $\|e_i^TQ\|_2^2 < a_i$, $\|e_j^TQ\|_2^2 > a_j$, and $\|e_k^TQ\|_2^2 = a_k$ for $i < k < j$
    if $a_i - \|e_i^TQ\|_2^2 \le \|e_j^TQ\|_2^2 - a_j$ then
        Compute $G$ to rotate rows $i$ and $j$ so that $\|e_i^TGQ\|_2^2 = a_i$
    else
        Compute $G$ to rotate rows $i$ and $j$ so that $\|e_j^TGQ\|_2^2 = a_j$
    end if
    Set $Q := GQ$ {Update}
end while
end while 
The rotations $G$ in Algorithm 4 are $m \times m$ Givens rotations that rotate rows $i$ and
$j$, and are computed from numerically stable expressions [5, section 3.1]. Algorithm 4
requires at most $m - 1$ rotations. Since each rotation changes only two rows, this
amounts to $O(mn)$ arithmetic operations.
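
The rotation loop can be sketched in plain Python as follows. This is a hedged illustration, not the kappaSQ implementation. Three points are assumptions of the sketch: it places the identity block in the last $n$ rows, so that with ascending $\alpha$ the rows whose norm is too small precede the rows whose norm is too large; it obtains the rotation angle from a quadratic in $\tan\theta$ derived here rather than from the expressions in [5, section 3.1]; and it compares norms up to a tolerance `tol`. The function name is hypothetical.

```python
import math


def rows_with_prescribed_norms(m, n, alpha, tol=1e-12):
    """Sketch of Algorithm 4: m x n matrix (list of rows) with orthonormal
    columns whose squared row norms are alpha (ascending, summing to n)."""
    # Assumption of this sketch: the identity block sits in the LAST n rows,
    # so rows with too small a norm precede rows with too large a norm.
    Q = [[1.0 if i - (m - n) == k else 0.0 for k in range(n)] for i in range(m)]
    r = [0.0] * (m - n) + [1.0] * n       # current squared row norms
    for _ in range(m):                    # at most m - 1 rotations are needed
        i = j = None
        for k in range(m):
            if alpha[k] - r[k] > tol:     # row k is still too small
                i = k
            elif r[k] - alpha[k] > tol:   # first row that is too large
                j = k
                break
        if i is None or j is None:
            break                         # all row norms match
        a, b = r[i], r[j]
        g = sum(x * y for x, y in zip(Q[i], Q[j]))   # inner product of the rows
        # Fix whichever of the two rows needs the smaller change.
        t = a + min(alpha[i] - a, b - alpha[j])      # new squared norm of row i
        # tan(theta) is the positive root of (b-t) tau^2 + 2 g tau + (a-t) = 0.
        root = math.sqrt(g * g + (b - t) * (t - a))
        tau = (t - a) / (g + root) if g >= 0 else (root - g) / (b - t)
        c = 1.0 / math.sqrt(1.0 + tau * tau)
        s = tau * c
        Q[i], Q[j] = ([c * x + s * y for x, y in zip(Q[i], Q[j])],
                      [c * y - s * x for x, y in zip(Q[i], Q[j])])
        r[i], r[j] = t, a + b - t         # a rotation preserves the sum a + b
    return Q


alpha = [0.3] * 5 + [0.5]                 # m = 6, n = 2, mu0 = 0.5
Q = rows_with_prescribed_norms(6, 2, alpha)
```

Each pass fixes at least one row norm and never touches a row that is already correct, which is why the loop needs no more than $m - 1$ rotations.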

An obvious choice for the input $\alpha$ is the vector with elements $\alpha_j = \frac{n - \mu_0}{m - 1}$, $1 \le
j \le m - 1$, and $\alpha_m = \mu_0$. If minimal coherence $\mu = n/m$ is desired, then this choice
reduces to $\alpha_j = n/m$ for all $j$.
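
As a small illustration of this choice, the Python sketch below builds a vector with $m - 1$ equal entries followed by $\mu_0$, scaled so that the entries sum to $n$. The function name `default_row_norms` and the example dimensions are assumptions made for illustration.

```python
def default_row_norms(m, n, mu0):
    """Squared row norms alpha with alpha_m = mu0 and m - 1 equal entries
    chosen so that the whole vector sums to n."""
    if m < 2 or not (n / m <= mu0 <= 1.0):
        raise ValueError("need m >= 2 and n/m <= mu0 <= 1")
    return [(n - mu0) / (m - 1)] * (m - 1) + [mu0]


alpha = default_row_norms(6, 2, 0.5)       # five equal entries, then 0.5
alpha_min = default_row_norms(8, 2, 0.25)  # minimal coherence: all entries n/m
```

The condition $n/m \le \mu_0$ is exactly what keeps the equal entries from exceeding $\mu_0$, so the vector stays in ascending order.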

Adversarial algorithm. Since the outcome of our experiments depends highly on
the particular type of matrix, we take a worst case approach and create matrices
that try to defeat the sampling strategies in Algorithms 2 and 3. We do this by first
creating a matrix $Q_s$ with specified coherence but very few rows, and then padding $Q_s$
with zero rows to expand it to the desired dimension.

Algorithm 5 Generating a "bad" matrix with specified coherence

Input: Integers $m$ and $n$ with $m \ge n \ge 1$, real number $\mu_0$ with $n/m \le \mu_0 \le 1$
Output: $m \times n$ matrix $Q$ with $Q^T Q = I_n$, coherence $\mu = \mu_0$, and many zero rows

Set $m_s := \lceil n/\mu_0 \rceil$ {Number of nonzero rows}
Use Algorithm 4 to compute an $m_s \times n$ matrix $Q_s$ with coherence $\mu_0$
Set $Q = \begin{pmatrix} Q_s \\ 0_{(m-m_s) \times n} \end{pmatrix}$ {Pad with $m - m_s$ zero rows}
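
The padding step can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the small matrix `Qs` is a hand-made stand-in for the output of Algorithm 4, chosen so that its columns are orthonormal and its coherence is 0.5.

```python
import math


def nonzero_row_count(n, mu0):
    """Smallest number of rows m_s for which coherence mu0 is attainable."""
    return math.ceil(n / mu0)


def pad_with_zero_rows(Qs, m):
    """Stack Qs on top of zero rows so that the result has m rows."""
    n = len(Qs[0])
    return [list(row) for row in Qs] + [[0.0] * n for _ in range(m - len(Qs))]


# Hypothetical 4 x 2 stand-in for the output of Algorithm 4 (coherence 0.5).
Qs = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]
Q = pad_with_zero_rows(Qs, 10)     # 10 x 2 matrix with 6 zero rows
```

Zero rows change neither the column inner products nor the largest row norm, so the padded matrix keeps orthonormal columns and the coherence of `Qs`.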



5.3. Structured matrices with specified coherence. We present two classes
of structured matrices that have specified coherence. These matrices have the
advantage that they can be generated faster than running Algorithm 4, but they have
limitations due to their structure.

The matrices $Q$ below are $m \times n$ with orthonormal columns and coherence $\mu = \mu_0$,
where $n/m \le \mu_0 \le 1$.

Stacks of diagonal matrices. The number of rows must be a multiple of $n$. Let
$m/n \ge 1$ be an integer. Set

[Displayed definition of $Q$ not recoverable from the source.]


Matrices with Hadamard Structure. These matrices are strictly rectangular, and
the number of rows must be a power of two. Let $m = 2^k$ for some integer $k \ge 1$, and
$n < m$. Set

[Displayed definitions not recoverable from the source: scalars $\alpha$ and $\beta$ defined in
terms of $\mu_0$ and $m - 1$; starting values that include $B_0 = \beta$; and, for $0 \le j \le k - 1$,
block recursions that build $B_{j+1}$ and $D_{j+1}$ from $B_j$ and $D_j$. The matrix $Q$ is formed
from $D_k$.]

Note that only the final matrix $Q$ has orthonormal columns and coherence $\mu$; the
intermediate matrices $B_j$ and $D_j$, in general, do not.

5.4. Determining $\epsilon$ in Theorem 3.13 and Corollary 4.6. In order to compare
Theorem 3.13 and Corollary 4.6 to Corollaries 3.10 and 4.3 we need to express
$\epsilon$ as a function of the other parameters. That is, given $\delta$, $m$, $n$, $\mu$, and $\gamma = c/m$, we
compute $\epsilon$ so that $f(\epsilon) = \delta$, where



$$f(t) \equiv n \left( \left( e^{-t} (1-t)^{-(1-t)} \right)^{\gamma/\mu} + \left( e^{t} (1+t)^{-(1+t)} \right)^{\gamma/\mu} \right).$$



Since an explicit expression seems out of reach, we use unconstrained nonlinear
optimization (a Nelder-Mead simplex direct search) to solve $\min_t (\delta - f(t))^2$ in Matlab.
This is presented as Algorithm 6 below, and is part of [12].

Algorithm 6 Computation of $\epsilon$ in Theorems 3.13 and 4.5 and Corollary 4.6

Input: Integers $m$ and $n$ with $1 \le n \le m$
       Real number $\gamma$ with $0 < \gamma < 1$
       Real numbers $\mu$ with $n/m \le \mu \le 1$, and $\delta$ with $0 < \delta < 1$
Output: Real number $0 < \epsilon < 1$ so that $f(\epsilon) = \delta$

1: Compute $\epsilon$ = fminsearch($(\delta - f(t))^2$, 0, $10^{-10}\delta$) in Matlab
2: if $0 < \epsilon < 1$ and $|\delta - f(\epsilon)| < 10^{-10}\delta$ then
3:     $\epsilon$ found
4: else
5:     $\epsilon$ does not exist
6: end if



The arguments of fminsearch in line 1 of Algorithm 6 are the objective function
$(\delta - f(t))^2$; the starting value 0 (chosen so that fminsearch does not get stuck in
flat parts of the objective function); and the termination tolerance $10^{-10}\delta$. If the
conditions on line 2 of Algorithm 6 are not met, then the bounds in Theorems 3.13
or 4.5 give no information. Note that these bounds hold only for matrices with
sufficiently low coherence and sufficiently more rows than columns.
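
For readers without Matlab, the same equation can be solved by a different technique. The Python sketch below uses plain bisection instead of the Nelder-Mead search of Algorithm 6. It assumes that $f$ has the Chernoff-type form $f(t) = n\,[(e^{-t}(1-t)^{-(1-t)})^{\gamma/\mu} + (e^{t}(1+t)^{-(1+t)})^{\gamma/\mu}]$; under that assumption $f$ decreases on $(0,1)$, so bisection applies. The function names, the interval end points, and the iteration count are choices made here for illustration.

```python
import math


def chernoff_f(t, n, gamma, mu):
    """Assumed form of f(t); evaluated through logarithms for stability."""
    k = gamma / mu
    lower = math.exp(k * (-t - (1.0 - t) * math.log(1.0 - t)))
    upper = math.exp(k * (t - (1.0 + t) * math.log(1.0 + t)))
    return n * (lower + upper)


def solve_epsilon(n, gamma, mu, delta, iters=200):
    """Return epsilon in (0, 1) with f(epsilon) = delta, or None if there is none."""
    lo, hi = 1e-12, 1.0 - 1e-12
    if not (chernoff_f(hi, n, gamma, mu) < delta < chernoff_f(lo, n, gamma, mu)):
        return None                  # delta is outside the range of f on (0, 1)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if chernoff_f(mid, n, gamma, mu) > delta:
            lo = mid                 # f is still too large: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Parameters in the style of Section 6: n = 5, mu = 1.5 n/m, c = 150, m = 10,000.
eps = solve_epsilon(5, 150 / 10_000, 1.5 * 5 / 10_000, 0.01)
```

A return value of `None` plays the role of the "does not exist" branch of Algorithm 6.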

6. Experiments. We present numerical experiments for matrices of small
dimension. After a brief discussion of the Matlab code (Section 6.1), we make the
following comparisons for the sampling strategies in Algorithms 2 and 3: Algorithms
4 and 5 for generating matrices (Section 6.2); the different bounds in Sections 3 and 4
(Section 6.3); and the best bounds and actual condition numbers (Section 6.4).

6.1. Matlab code. All numerical experiments were performed with the Matlab
package kappaSQ [12]. It implements the sampling strategies in Algorithms 2 and 3,
the different matrix generation algorithms from Section 5 and their failure rates, and
plots condition numbers and the various bounds from Sections 3 and 4. To use the
package, simply enter run_kappaSQ in Matlab and follow the on-screen prompts.

6.2. Matrix generation. We compare the matrix generation strategies in
Algorithms 4 and 5 when sampling with Algorithms 2 and 3.

Fig. 6.1. Condition numbers and failures for matrices $SQ$, sampled with Algorithm 2, versus
amount of sampling, $c$. Here $m = 10{,}000$, $n = 5$ and $\mu = .05$. Left panels: $Q$ from Algorithm 4.
Right panels: $Q$ from Algorithm 5. Upper panels: Condition numbers $\kappa(SQ)$. Lower panels: Failure
rates.

Fig. 6.2. Condition numbers and failures for matrices $SQ$, sampled with Algorithm 3 and
sampling probability $\gamma = c/m$, versus $c = \gamma m$. Here $m = 10{,}000$, $n = 5$ and $\mu = .05$. Left panels:
$Q$ from Algorithm 4. Right panels: $Q$ from Algorithm 5. Upper panels: Condition numbers $\kappa(SQ)$.
Lower panels: Failure rates.



Set up. Algorithms 4 and 5 produce $m \times n$ matrices $Q$ with $m = 10{,}000$, $n = 5$
and $\mu = .05 = 100\, n/m$. Algorithm 2 uses 54 values of $c$ in the interval $[n, m]$, while
Algorithm 3 samples with probability $\gamma = c/m$ for the same $c$ values.
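
The difference between the two strategies lies only in how row indices are selected. The Python sketch below illustrates the index selection under the assumption that rows are drawn uniformly; any scaling of the sampled rows in Algorithms 2 and 3 is omitted, and the function names and the value $c = 500$ are illustrative.

```python
import random


def sample_with_replacement(m, c, rng):
    """Indices of c rows, drawn uniformly and independently from 0..m-1."""
    return [rng.randrange(m) for _ in range(c)]


def sample_bernoulli(m, gamma, rng):
    """Indices of the rows kept when each row is kept with probability gamma."""
    return [i for i in range(m) if rng.random() < gamma]


rng = random.Random(0)            # fixed seed, only for reproducibility
m, c = 10_000, 500
rows_a = sample_with_replacement(m, c, rng)   # exactly c indices, repeats possible
rows_b = sample_bernoulli(m, c / m, rng)      # about c indices, no repeats
```

With $\gamma = c/m$ the Bernoulli strategy selects $c$ rows on average, which is why the two strategies are compared at the same values of $c$.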

The plots in Figures 6.1 and 6.2 were produced from 30 runs. The upper panels
show the condition numbers, $\kappa(SQ)$, versus the amount of sampling $c$, or $c = \gamma m$.
The lower panels show the percentage of failures versus $c$, or $c = \gamma m$. Failure here
means that the sampled matrix is rank deficient.

Sampling with Algorithm 2. In Figure 6.1 all condition numbers $\kappa(SQ) \le 10$,
but the matrices from the adversarial Algorithm 5 have consistently higher condition
numbers, and higher failure rates for $c \le 4000$.

Sampling with Algorithm 3. In Figure 6.2 all condition numbers $\kappa(SQ) \le 10$,
but the matrices from the adversarial Algorithm 5 have consistently higher condition
numbers, and higher failure rates for $c \le 3000$. Comparing plots in the same columns
of Figures 6.1 and 6.2 suggests that sampling $c$ rows with replacement (Algorithm 2) is
about the same as Bernoulli sampling of rows with probability $\gamma = c/m$ (Algorithm 3).

Conclusions.

1. There appears to be little difference between the two sampling strategies.
   Sampling $c$ rows with replacement (Algorithm 2) is about the same as Bernoulli
   sampling of rows with probability $\gamma = c/m$ (Algorithm 3).

2. Sampled matrices of full rank tend to be well conditioned.
   That is, we observed either failure ($SQ$ is rank deficient) or else complete
   success ($\kappa(SQ) \le 10$), but little in between.

3. The adversarial Algorithm 5 tends to produce matrices with slightly higher
   condition numbers than Algorithm 4.

4. It is easy to generate matrices where efficient sampling ($c \le m/10$) fails.
   That is, Algorithm 5 produces a large proportion of sampled matrices that
   are rank deficient, even when the matrices are tall and skinny ($m/n = 2000$)
   and have moderately good coherence ($\mu = 100\, n/m$).

Fig. 6.3. Bounds for $\kappa(SQ)$, sampled with Algorithm 2, versus amount of sampling, $c$. Here
$m = 10{,}000$, $n = 5$ and $\mu = 1.5\, n/m$. Left panel: Bounds from Theorem 3.2, Theorem 3.5,
Corollary 3.10 and Theorem 3.13 for $n \le c \le m$. Right panel: Bounds from Corollaries 4.3 and 4.6
for $n \le c \le 1000$.

In the subsequent experiments we take an adversarial point of view and generate
matrices $Q$ with Algorithm 5, because the sampled matrices $SQ$ are more likely to be
rank deficient or have higher condition numbers.
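
The high failure rates for matrices from Algorithm 5 can be made plausible with a short calculation. Such a matrix has only $m_s$ nonzero rows, so if fewer than $n$ of the $c$ sampled rows are nonzero, the sampled matrix is necessarily rank deficient. Assuming that rows are drawn uniformly with replacement (an assumption of this sketch), the number of nonzero rows drawn is binomial with parameters $c$ and $m_s/m$. The Python sketch below evaluates the resulting lower bound on the failure probability for $m = 10{,}000$, $n = 5$ and $m_s = 100$. The function name is hypothetical, and the values it prints are not the measured failure rates of Figures 6.1 and 6.2.

```python
import math


def rank_deficiency_lower_bound(m, n, ms, c):
    """P(fewer than n of c uniform draws hit one of the ms nonzero rows)."""
    p = ms / m
    return sum(math.comb(c, k) * p**k * (1.0 - p)**(c - k) for k in range(n))


for c in (100, 1000, 4000):
    print(c, rank_deficiency_lower_bound(10_000, 5, 100, c))
```

The bound is close to one for $c = 100$ and becomes negligible only once $c$ is large enough that many more than $n$ nonzero rows are expected in the sample.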

6.3. Comparison of bounds. We compare the condition number bounds for
both sampling strategies: with replacement (Algorithm 2), and Bernoulli sampling
(Algorithm 3), when they hold with 99.99 percent probability.

Set up. The adversarial Algorithm 5 produces $m \times n$ matrices $Q$ with $m = 10{,}000$,
$n = 5$ and $\mu = 1.5\, n/m$. Figures 6.3 and 6.4 show the bounds for $\kappa(SQ)$ with $\delta = .01$,
versus the amount of sampling $c$, or $c = \gamma m$. The left panels show 58 values of $c$ in
the interval $[n, m]$, while the right panels show $n \le c \le 1000$.

Sampling with Algorithm 2. We compare the bounds for sampling with replacement
in Algorithm 2. The left panel of Figure 6.3 shows the bounds for $\kappa(SQ)$ from
Theorem 3.2, Theorem 3.5, Corollary 3.10 and Theorem 3.13 versus the amount of
sampling, $c$. The bounds from Corollary 3.10 and Theorem 3.13 are almost identical.
In contrast, Theorems 3.2 and 3.5 hold only for much larger values of $c$ and produce
worse bounds for $\kappa(SQ)$.

The right panel of Figure 6.3 zooms into smaller values of $c$ for the better bounds,
in Corollary 3.10 and Theorem 3.13. Both bounds are about the same, and suggest
that Algorithm 2 can sample matrices $SQ$ with condition numbers $\kappa(SQ) \le 10$
for sampling amounts $c \gtrsim 150$. This agrees with Corollary 3.11: if we use its lower
bound for $c$ with $\epsilon = 99/101$, so that $\sqrt{(1+\epsilon)/(1-\epsilon)} = 10$, then $c \ge 144$.
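
The value of $\epsilon$ that corresponds to a condition number bound of 10 follows from a short calculation, written out here as a worked step:

$$\sqrt{\frac{1+\epsilon}{1-\epsilon}} = 10 \;\Longrightarrow\; 1+\epsilon = 100\,(1-\epsilon) \;\Longrightarrow\; 101\,\epsilon = 99 \;\Longrightarrow\; \epsilon = \frac{99}{101} \approx 0.98 .$$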

Sampling with Algorithm 3. We compare the bounds for Bernoulli sampling with
Algorithm 3 and sampling probability $\gamma = c/m$. The left panel of Figure 6.4 shows
the bounds for $\kappa(SQ)$ from Corollaries 4.3 and 4.6 versus $c = \gamma m$.

The right panel of Figure 6.4 zooms into smaller values of $c$. Corollary 4.6 holds
for much smaller values of $c$ than Corollary 4.3, and also produces lower condition
numbers. Corollary 4.3 becomes tighter only for large sampling amounts $c \to m$,
because $\epsilon_3 \to 1$ as $\gamma \to 1$, see Section 4.1.

Fig. 6.4. Bounds for $\kappa(SQ)$, sampled with Algorithm 3 and sampling probability $\gamma = c/m$, from
Corollaries 4.3 and 4.6 versus $c = \gamma m$. Here $m = 10{,}000$, $n = 5$ and $\mu = 1.5\, n/m$. Left panel:
$n \le c \le m$. Right panel: $n \le c \le 1000$.

The bounds suggest that Algorithm 3 can sample matrices $SQ$ with condition
numbers $\kappa(SQ) \le 10$ with sampling probabilities $\gamma \ge c/m$ where $c \gtrsim 150$, much like
Algorithm 2.

Conclusions.

1. Sampling with Algorithm 2: Theorem 3.8/Corollary 3.10 and Theorem 3.13
   give the tightest bounds (which hold with 99.99 percent probability).

2. Sampling with Algorithm 3: Corollary 4.6 gives the tightest bounds for
   practical values of $\gamma = c/m$, where $c \lesssim m/2$.

3. Corollary 3.11 predicts correctly the minimal amount of sampling required,
   for both Algorithms 2 and 3.
   That is, sampling amounts $c$ that satisfy the lower bound in Corollary 3.11,
   which grows with $\ln(2n/\delta)$, produce well conditioned matrices $SQ$,
   with condition numbers $\kappa(SQ) \le 10$.

6.4. Bounds versus actual condition numbers. We compare the best bounds
in Sections 3 and 4 with actual condition numbers.

Set up. The adversarial Algorithm 5 produces $m \times n$ matrices $Q$ with $m = 10{,}000$,
$n = 5$ and $\mu = 1.5\, n/m$. Figures 6.5 and 6.6 show the actual condition numbers $\kappa(SQ)$,
as well as the bounds with $\delta = .01$, versus the amount of sampling $c$, or $c = \gamma m$. The
left panels show 54 values of $c$ in the interval $[n, m]$, while the right panels show
$n \le c \le 1000$. The plots for the sampling were produced from 30 runs.

Sampling with Algorithm 2. We compare the actual condition numbers, $\kappa(SQ)$,
when $SQ$ is produced by sampling with replacement in Algorithm 2, and the bound
in Corollary 3.10 (holding with 99.99 percent probability). Since the bound in
Theorem 3.13 is identical to the one in Corollary 4.6, we plot it in Figure 6.6.

Both panels in Figure 6.5 illustrate that Corollary 3.10 gives good estimates for
$\kappa(SQ)$ when the sampling amount $c$ is above the threshold in Corollary 3.11. The
left panel illustrates that for smaller values of $c$, below the threshold, Corollary 3.10
is too pessimistic. But Algorithm 2 can still produce well conditioned matrices $SQ$,
with condition numbers $\kappa(SQ) \le 10$. However, the failure rate is higher, as shown in
Section 6.2, and Algorithm 2 tends to produce a higher proportion of rank deficient
matrices.

Sampling with Algorithm 3. We compare the actual condition numbers, $\kappa(SQ)$,
when $SQ$ is produced by Bernoulli sampling of rows in Algorithm 3 with sampling
probability $\gamma = c/m$, and the bound in Corollary 4.6 (holding with 99.99 percent
probability). The plots are shown in Figure 6.6, and the observations are exactly the
same as for sampling with Algorithm 2.

Fig. 6.5. Actual condition numbers, sampled with Algorithm 2, and bound from Corollary 3.10
with 99.99 percent probability, versus amount of sampling, $c$. Here $m = 10{,}000$, $n = 5$ and $\mu =
1.5\, n/m$. Left panel: $n \le c \le m$. Right panel: $n \le c \le 1000$.

Fig. 6.6. Actual condition numbers, sampled with Algorithm 3 and sampling probability $\gamma =
c/m$, and bound from Corollary 4.6 with 99.99 percent probability, versus $c$. Here $m = 10{,}000$,
$n = 5$ and $\mu = 1.5\, n/m$. Left panel: $n \le c \le m$. Right panel: $n \le c \le 1000$.

Conclusions.



1. For sampling amounts $c$ at or above the threshold in Corollary 3.11, the best
   bounds (when holding with 99.99 percent probability) give good estimates of the
   condition numbers: Theorem 3.8/Corollary 3.10 and Theorem 3.13 for sampling
   with Algorithm 2, and Corollary 4.6 for sampling with Algorithm 3.

2. For smaller values of $c$, below that threshold, the sampling strategies in
   Algorithms 2 and 3 still produce well conditioned matrices, but at a higher
   failure rate, that is, they generate more rank deficient matrices.